22 research outputs found

Development of the Slovak HMM-Based TTS System and Evaluation of Voices in Respect to the Used Vocoding Techniques

Authors: Juhár Jozef, Rusko Milan, Sulír Martin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 10/02/2017

This paper describes the development of a Slovak text-to-speech system which applies a technique wherein speech is directly synthesized from hidden Markov models. Statistical models for Slovak speech units are trained using newly created female and male phonetically balanced speech corpora. In addition, contextual information about phonemes, syllables, words, phrases, and utterances was determined, as well as questions for decision tree-based context clustering algorithms. In this paper, recent statistical parametric speech synthesis methods, including the conventional, STRAIGHT and AHOcoder speech synthesis systems, are implemented and evaluated. Objective evaluation methods (mel-cepstral distortion and fundamental frequency comparison) and subjective ones (mean opinion score and semantically unpredictable sentences test) are carried out to compare these systems with each other and to evaluate their overall quality. The result of this work is a set of text-to-speech systems for the Slovak language which are characterized by very good intelligibility and quite good naturalness of the output utterances. In the subjective tests of intelligibility, the STRAIGHT-based female voice and the AHOcoder-based male voice reached the highest scores.
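
For readers unfamiliar with the objective measure named in this abstract, the following is a minimal sketch of mel-cepstral distortion in one commonly used form. It is not the authors' code: the abstract does not say which variant was used, and the function name, the exclusion of coefficient 0 (energy), and the assumption that the two frame sequences are already time-aligned are all illustrative choices.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Assumptions (not taken from the paper): each frame is a sequence of
    mel-cepstral coefficients, coefficient 0 (energy) is excluded, and
    the two sequences are already aligned frame by frame.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need two non-empty, equally long frame sequences")

    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        squared_diff = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared_diff)
    return total / len(ref_frames)


# Hypothetical toy frames: [c0, c1, c2]
reference = [[1.0, 0.5, 0.2], [1.1, 0.4, 0.1]]
synthesized = [[0.9, 0.5, 0.2], [1.0, 0.4, 0.1]]
print(mel_cepstral_distortion(reference, synthesized))
```

Lower values indicate that the synthesized spectral envelope is closer to the reference; in practice the frames would first be aligned, for example with dynamic time warping.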

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Expressive Speech Synthesis for Critical Situations

Authors: Darjaa Sakhia, Ritomský Marian, Rusko Milan, Sabo Róbert, Trnka Marián
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 11/02/2015

The presence of appropriate acoustic cues of affective features in synthesized speech can be a prerequisite for the proper evaluation of the semantic content by the message recipient. In recent work the authors have focused on expressive speech synthesis capable of generating natural-sounding synthetic speech at various levels of arousal. Automatic information and warning systems can be used to inform, warn, instruct and navigate people in dangerous, critical situations, and to increase the effectiveness of crisis management and rescue operations. One of the activities within the EU SF project CRISIS was called "Extremely expressive (hyper-expressive) speech synthesis for urgent warning messages generation". It was aimed at the research and development of speech synthesizers with high naturalness and intelligibility, capable of generating messages with various expressive loads. The synthesizers will be applicable to generating public alert and warning messages in case of fires, floods, state security threats, etc. Early warning in these situations is made possible by fire and flood spread forecasting, the modeling of which is covered by other activities of the CRISIS project. The most important component needed to build the synthesizer is the expressive speech database. An original method is proposed to create such a database. The current version of the expressive speech database is introduced, and first experiments with expressive synthesizers developed with this database are presented and discussed.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Voice Operated Information System in Slovak

Authors: Jarina Roman, Juhár Jozef, Rozinaj Gregor, Rusko Milan, Trnka Marián, Čižmár Anton
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/01/2012

Speech communication interfaces (SCI) are nowadays widely used in several domains. Automated spoken language human-computer interaction can replace human-human interaction if needed. Automatic speech recognition (ASR), a key technology of SCI, has been extensively studied during the past few decades. Most present systems are based on statistical modeling, at both the acoustic and linguistic levels. Increased attention has recently been paid to speech recognition in adverse conditions, since noise resistance has become one of the major bottlenecks for the practical use of speech recognizers. Although many techniques have been developed, many challenges still have to be overcome before the ultimate goal, creating machines capable of communicating with humans naturally, can be achieved. In this paper we describe the research and development of the first Slovak spoken language dialogue system (SLDS). The dialogue system is based on the DARPA Communicator architecture. The proposed system consists of the Galaxy hub and telephony, automatic speech recognition, text-to-speech, backend, transport and VoiceXML dialogue management modules. The SCI enables multi-user interaction in the Slovak language. The functionality of the SLDS is demonstrated and tested via two pilot applications, "Weather forecast for Slovakia" and "Timetable of Slovak Railways". The required information is retrieved from Internet resources in multi-user mode through PSTN, ISDN, GSM and/or VoIP networks.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Prediction of Stress Level from Speech – from Database to Regressor

Authors: Darjaa Sakhia, Rusko Milan, Sabo Róbert, Schaper Meilin, Stelkens-Kobsch Tim, Trnka Marián
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 31/01/2024

The term stress can designate a number of situations and affective reactions. This work focuses on the immediate stress reaction caused by, for example, threat, danger, fear, or great concern. Could measuring stress from speech be a viable, fast and non-invasive method? The article describes the development of a system predicting stress from voice – from the creation of the database and the preparation of the training data to the design and testing of the regressor. StressDat, an acted database of speech under stress in Slovak, was designed. The methodology was published during the database's development in [1]; this work describes the final form, annotation, and basic acoustic analyses of the data. The utterances presenting various stress-inducing scenarios were acted at three intended stress levels. The annotators used a "stress thermometer" to rate the perceived stress in each utterance on a scale from 0 to 100. Thus, data with a resolution suitable for training the regressor was obtained. Several regressors were trained, tested and compared. On the test set, the stress estimation works well (R² = 0.72, Concordance Correlation Coefficient = 0.83), but practical application will require much larger volumes of specific training data. StressDat was made publicly available.
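
The two figures of merit quoted in this abstract can be illustrated with a minimal sketch using their standard textbook definitions. This is not the authors' evaluation code; the function names and the toy ratings on the 0 to 100 scale are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    # Unlike Pearson correlation, a constant offset between the two
    # series lowers the score through the squared mean difference.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


# Hypothetical annotator ratings and regressor outputs (scale 0..100)
ratings = [0.0, 50.0, 100.0]
predictions = [10.0, 60.0, 110.0]
print(r_squared(ratings, predictions), concordance_cc(ratings, predictions))
```

In this toy case the predictions are perfectly correlated with the ratings but shifted by 10 points, so both scores fall below 1; this sensitivity to systematic offsets is why the concordance coefficient is often reported for continuous affect ratings.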

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Speaker Authorization for Air Traffic Control Security

Authors: Darjaa Sakhia, Rusko Milan, Schaper Meilin, Stelkens-Kobsch Tim H., Trnka Marián
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

The number of incidents in which unauthorized persons break into frequencies used by Air Traffic Controllers (ATCOs) and give false instructions to pilots, or transmit fake emergency calls, is a permanent and apparently growing threat. One of the measures against such attacks could be to use automatic speaker recognition on the voice radio channel to disclose the potential unauthorized speaker. This work describes the solution for a speaker authorization system in the Security of Air Transport Infrastructures of Europe (SATIE) project, presents the architecture of the system, gives details on training and testing procedures, analyses the influence of the number of authorized persons on the system’s performance and describes how the system was adapted to work on the radio channel.

Institute of Transport Research:Publications

Corpus of Spoken Slovak Language

Author: Milan Rusko

Abstract. This paper gives a short description of activities towards building a general speech corpus of the spoken Slovak language. The different rôles and specific features of a text corpus and a speech corpus are investigated, and the most frequent mistakes and misunderstandings concerning the concept of a speech corpus are mentioned. The concept of a big representative corpus of spoken language and its desired properties are presented. The paper gives an overview of the current state of the art in speech corpora all over the world. It explains the need for a national speech corpus and indicates some of the typical areas of research and applications that take advantage of the existence of such a corpus. The speech databases currently available in Slovakia are listed and the particularities of the annotation structures of these databases are pointed out. The authors search for a general annotation structure suitable for the kind of speech corpus envisaged. Some of the basic concepts and technical solutions used in recording and computer-aided annotation for the existing speech corpora are described. The most significant problems standing in the way of building a big speech corpus are pointed out. Furthermore, a pilot version of a speech corpus is presented, containing several recordings and their orthographic transcription. Keywords: speech corpus, database, spoken speech, Slovak.

CiteSeerX

Using Speech Analysis and Stress Detection in ATC Voice Communication

Authors: Finke Michael, Rusko Milan
Publication date: 01/01/2016

Institute of Transport Research:Publications